WOMBAT 2025 Tutorial

Visualising Uncertainty

Harriet Mason, Dianne Cook

Department of Econometrics and Business Statistics

Introduction to Spatial Visualisation

Citizen Scientist Data

  • There have been reports of a strange spatial pattern in temperatures across Iowa
  • We ask citizen scientists to measure the temperature at their homes and report back
  • To maintain anonymity, we are only given the county of each scientist
scientistID  county_name       recorded_temp
#74991       Lyon County       21.1
#22780       Dubuque County    28.9
#55325       Crawford County   26.4
#46379       Allamakee County  27.1
#84259       Jones County      34.2

990 citizen scientists participated

We could just plot the data…

  • We often get spatial data as longitude and latitude coordinates, which we can plot directly
  • This approach is easy but lacks the contextual information that gives our plots meaning.

Spatial features objects

  • sf objects differ from an ordinary tibble because their geometry column carries extra metadata, the coordinate reference system (CRS), which records:
    • Assumptions about the shape of the planet (geodetic datum)
    • Distortions we will/won’t accept when drawing the map (map projection)
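A minimal sketch of where that metadata lives, using a single hypothetical point near Des Moines rather than the tutorial's county data: `st_crs()` reports the CRS attached to the geometry, and `st_transform()` reprojects the same location from WGS 84 longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857), the CRS the cartogram exercise later in this session transforms to.

```r
library(sf)

# Hypothetical point near Des Moines, Iowa, in longitude/latitude (WGS 84)
pt_ll <- st_sfc(st_point(c(-93.6, 41.6)), crs = 4326)

# The CRS is stored alongside the geometry
st_crs(pt_ll)$epsg     # EPSG code of the CRS
st_is_longlat(pt_ll)   # TRUE: coordinates are angles, not distances

# Reproject to Web Mercator: same location, different map projection
pt_merc <- st_transform(pt_ll, 3857)
st_crs(pt_merc)$epsg
st_coordinates(pt_merc)  # coordinates are now in metres rather than degrees
```

The geometry looks the same to the user, but any distance, area or cartogram calculation depends on which CRS is attached.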

Can you see the spatial trend?

Estimate the county mean

  • Visualising an estimate, such as a mean, can make trends easier to see
    • We should really visualise the sampling distribution of the estimate, but often we do not bother…
Code
library(dplyr)

# Calculate the mean, standard error and sample size for each county
toy_temp |>
  group_by(county_name) |>
  summarise(temp_mean = mean(recorded_temp),
            temp_se = sd(recorded_temp) / sqrt(n()),
            n = n())
county_name       temp_mean  temp_se  n
Adair County      29.7       0.907    6
Adams County      29.6       1.003    9
Allamakee County  26.3       0.550    8
Appanoose County  22.8       0.831    14
Audubon County    27.6       0.893    11
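One way to take the sampling distribution seriously is to treat each county mean as approximately normal, with standard deviation equal to its standard error. A minimal base-R sketch using the five rows shown above; the 95% level and the normal approximation are assumptions for illustration (a t distribution would give somewhat wider intervals for counties with only 6 to 14 scientists).

```r
# County summaries copied from the table above
county_summary <- data.frame(
  county_name = c("Adair County", "Adams County", "Allamakee County",
                  "Appanoose County", "Audubon County"),
  temp_mean = c(29.7, 29.6, 26.3, 22.8, 27.6),
  temp_se   = c(0.907, 1.003, 0.550, 0.831, 0.893)
)

# Approximate 95% interval from the normal sampling distribution N(mean, se^2)
county_summary$lower <- qnorm(0.025, mean = county_summary$temp_mean,
                              sd = county_summary$temp_se)
county_summary$upper <- qnorm(0.975, mean = county_summary$temp_mean,
                              sd = county_summary$temp_se)
county_summary
```

Overlapping intervals between neighbouring counties are a first hint that an apparent spatial trend may not be statistically strong.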

Can you see the trend now?

Common Map Visualisations

  • Usually spatial data is shown using a choropleth map
    • Choropleth maps shade an area according to an average or total
  • We can also weight each area by a different variable (such as sample size)
    • e.g. cartograms and bubble plots
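A sketch of the two display types, using the North Carolina county data that ships with sf as a stand-in (the Iowa data are not reproduced here). The choropleth fills each county by a rate; the bubble plot places a point at each county centroid sized by the number of births, which plays the role that sample size plays for the citizen scientist data.

```r
library(sf)
library(ggplot2)

# Stand-in data shipped with sf: North Carolina counties
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$sids_rate <- 1000 * nc$SID74 / nc$BIR74   # SIDS deaths per 1000 births

# Choropleth: shade each county by its rate
choropleth <- ggplot(nc) +
  geom_sf(aes(fill = sids_rate)) +
  labs(fill = "SIDS per 1000 births")

# Bubble plot: one point per county centroid, sized by the number of births
nc_points <- nc
st_geometry(nc_points) <- st_centroid(st_geometry(nc))
bubble <- ggplot() +
  geom_sf(data = nc, fill = "grey95") +
  geom_sf(data = nc_points, aes(size = BIR74, colour = sids_rate)) +
  labs(size = "Births (1974)", colour = "SIDS per 1000 births")
```

Printing `choropleth` and `bubble` side by side shows how small, sparsely sampled counties dominate less once size encodes the amount of data.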

But what if the error is worse?

  • It turns out the citizen scientists are using some pretty old measuring equipment
  • The standard error could be up to three times what we would estimate under our usual assumptions
  • We want to compare both versions of the data to see the impact of this measurement error
county_name       temp_mean  low_temp_se  high_temp_se  n  county_geometry
Adair County      29.7       0.907        2.72          6  MULTIPOLYGON (((441130 -374...
Adams County      29.6       1.003        3.01          9  MULTIPOLYGON (((424556 -414...
Allamakee County  26.3       0.550        1.65          8  MULTIPOLYGON (((675217 -131...
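The high-error version only rescales the standard error, so it can be derived from the low-error column in one step. A sketch using the three rows shown above with the geometry column dropped; the factor of three is the worst case stated on this slide.

```r
library(dplyr)

# Three county rows copied from the table above (geometry column omitted)
toy_se <- tibble(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  temp_mean   = c(29.7, 29.6, 26.3),
  low_temp_se = c(0.907, 1.003, 0.550),
  n           = c(6, 9, 8)
)

# Worst case: the true standard error is three times the usual estimate
toy_se <- toy_se |>
  mutate(high_temp_se = 3 * low_temp_se)
toy_se
```

The means are identical in both versions, so any visual difference between the two maps comes only from how uncertainty is displayed.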

Spot the difference

Approaches to Spatial Uncertainty

Visualisation goals

  • We need to look at the data and identify:
    • The spatial trend (does it exist or not)
    • The statistical strength of the spatial trend

Solution: add an axis for uncertainty

Does this work? Not really

  • Pros
    • Included uncertainty and increased transparency
  • Cons
    • Signals in high-uncertainty areas are still very visible
    • 2D palette is harder to read
      • Colour perception is not a simple 3D space
      • Using saturation hurts accessibility
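A bivariate map classifies each area on two axes (estimate and uncertainty) and looks the pair up in a 2D palette. A base-R sketch of that lookup using the county-mean table; the tercile breaks and the 3 x 3 palette are hypothetical choices for illustration, not the palette used in the tutorial's figure.

```r
# County summaries copied from the county-mean table
temp_mean <- c(29.7, 29.6, 26.3, 22.8, 27.6)
temp_se   <- c(0.907, 1.003, 0.550, 0.831, 0.893)

# Split a variable into three classes using tercile breaks
tercile_class <- function(x) {
  cut(x, breaks = quantile(x, probs = c(0, 1/3, 2/3, 1)),
      include.lowest = TRUE, labels = FALSE)
}
mean_class <- tercile_class(temp_mean)  # 1 = cool, 3 = warm
se_class   <- tercile_class(temp_se)    # 1 = precise, 3 = uncertain

# Hypothetical palette: rows = uncertainty class, columns = temperature class
bivariate_palette <- matrix(c(
  "#FFEDA0", "#FD8D3C", "#BD0026",   # low uncertainty: saturated
  "#F5E6B8", "#E8A87C", "#B85560",   # medium uncertainty
  "#EDE4D3", "#D9C2B3", "#B89A9E"    # high uncertainty: washed out
), nrow = 3, byrow = TRUE)

# Each county takes the colour in its (uncertainty, temperature) cell
county_fill <- bivariate_palette[cbind(se_class, mean_class)]
county_fill
```

The reader now has to decode nine colours along two perceptual directions, which is the readability cost listed above.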

Solution: blend the colours together!

Does this work? Kind of…

  • Pros
    • Included uncertainty and increased transparency
    • Removed false signals
  • Cons
    • Still uses a 2D colour palette
    • The standard error at which colours are blended is arbitrary
      • Blend at 1? 2? 4? 37?
      • Impossible to align with hypothesis testing
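The blending idea can be sketched as a linear mix in RGB space between each county's colour and a neutral grey, weighted by its standard error. `blend_to_grey()` and its `threshold` argument are hypothetical names for this sketch; the threshold is exactly the made-up number the slide complains about.

```r
# Hypothetical helper: fade each colour towards grey as the standard error
# grows, reaching full grey once se >= threshold (an arbitrary choice)
blend_to_grey <- function(colour, se, threshold = 2, grey = "#808080") {
  colour <- rep_len(colour, length(se))
  w <- pmin(se / threshold, 1)                 # 0 = original colour, 1 = grey
  base <- grDevices::col2rgb(colour)           # 3 x n matrix of RGB values
  target <- grDevices::col2rgb(grey)[, 1]      # RGB values of the grey
  # Linear mix in RGB space, one column per county
  mixed <- base * rep(1 - w, each = 3) + target * rep(w, each = 3)
  grDevices::rgb(mixed[1, ], mixed[2, ], mixed[3, ], maxColorValue = 255)
}

# Same colour and standard errors, three different thresholds
se <- c(0.55, 0.907, 2.72)
blend_to_grey("#BD0026", se, threshold = 1)
blend_to_grey("#BD0026", se, threshold = 2)
blend_to_grey("#BD0026", se, threshold = 4)
```

Each threshold gives a different map from the same data, and none of them corresponds to a significance level.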

Solution: simulate a sample

Does this work? Almost!

  • Pros
    • Included uncertainty
    • High uncertainty interferes with reading the plot (?)
    • 1D colour palette
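The sampling step behind a pixel map can be sketched in base R: each county is repeated once per pixel, and each pixel gets its own draw from that county's sampling distribution. This covers only the draws. Splitting each county polygon into cells is left to a mapping package and is not shown. `n_pixels` is a hypothetical parameter, and the normal sampling distribution is an assumption.

```r
set.seed(2025)  # fixed seed so the sketch is reproducible

county_summary <- data.frame(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  temp_mean   = c(29.7, 29.6, 26.3),
  temp_se     = c(0.907, 1.003, 0.550)
)
n_pixels <- 7  # hypothetical number of pixels per county

# Repeat each county once per pixel, then draw one value per pixel
pixel_draws <- county_summary[rep(seq_len(nrow(county_summary)), each = n_pixels), ]
pixel_draws$pixel <- rep(seq_len(n_pixels), times = nrow(county_summary))
pixel_draws$temp  <- rnorm(nrow(pixel_draws),
                           mean = pixel_draws$temp_mean,
                           sd   = pixel_draws$temp_se)
rownames(pixel_draws) <- NULL
head(pixel_draws)
```

Counties with large standard errors get pixels that disagree with each other, so they look noisy rather than confidently hot or cold, all on a single 1D colour scale.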

Making a Pixel Map with ggdibbler

Alternative software for incorporating uncertainty

  • Existing tidy data structures are not great for uncertain data
  • e.g. Vizumap
    • Makes bivariate maps and pixel (sample) maps
    • Package is designed specifically for uncertainty
  • Issues
    • ggplot2 flexibility is lost
      • e.g. you can only use one of three specific palettes
    • Very computationally expensive
      • A simple map can take over a minute to run
    • Every component must be made separately and then combined

ggplot2 uses the grammar of graphics

It is designed to take in data

Not theoretical distributions

This is what ggdibbler is for
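Where ggplot2 expects a column of numbers, the ggdibbler examples in this session use a column of distributions. A sketch of building one with the distributional package, assuming (as in the exercise solution) a normal sampling distribution per county; `toy_dist` is a hypothetical name and may not match how `toy_temp_dist` was actually built.

```r
library(dplyr)
library(distributional)

# County summaries copied from the county-mean table
toy_summary <- tibble(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  temp_mean   = c(29.7, 29.6, 26.3),
  temp_se     = c(0.907, 1.003, 0.550)
)

# One normal sampling distribution per county; sigma is a standard
# deviation, so the standard error is passed directly (not squared)
toy_dist <- toy_summary |>
  mutate(temp_dist = dist_normal(mu = temp_mean, sigma = temp_se))

toy_dist$temp_dist                 # a vector of distributions, one per row
mean(toy_dist$temp_dist)           # recovers the county means
generate(toy_dist$temp_dist, 5)    # five random draws from each distribution
```

The column stores the whole distribution rather than a single number, which is what lets a geom draw a different sample every time it is rendered.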

Basic ggdibbler Example

Code
library(ggplot2)
library(ggdibbler)

# geom_sf_sample() fills each county with values sampled from its distribution
toy_temp_dist |>
  ggplot() +
  geom_sf_sample(aes(geometry = county_geometry,
                     fill = temp_dist))

We can still use ggplot2's flexibility

Code
ggplot(toy_temp_dist) +
  geom_sf_sample(aes(geometry=county_geometry, fill=temp_dist),  linewidth=0, n=7) +
  geom_sf(aes(geometry = county_geometry), fill=NA, linewidth=0.5, colour="white") +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  ggtitle("A super cool and customised plot")

Remember, the plot is random: running the same code again draws a new sample and gives a different plot

Code
ggplot(toy_temp_dist) +
  geom_sf_sample(aes(geometry=county_geometry, fill=temp_dist),  linewidth=0, n=7) +
  geom_sf(aes(geometry = county_geometry), fill=NA, linewidth=0.5, colour="white") +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  ggtitle("A super cool and customised plot")
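The randomness is ordinary pseudo-random sampling, so fixing the seed should make a figure reproducible. A base-R illustration of the principle with one county's sampling distribution; that ggdibbler draws its samples from R's random number generator, so that calling `set.seed()` before printing the plot fixes the picture, is an assumption we have not checked against its internals.

```r
# Two draws from the same distribution differ...
draw_a <- rnorm(7, mean = 29.7, sd = 0.907)
draw_b <- rnorm(7, mean = 29.7, sd = 0.907)

# ...unless the random seed is reset to the same value before each draw
set.seed(2025)
draw_c <- rnorm(7, mean = 29.7, sd = 0.907)
set.seed(2025)
draw_d <- rnorm(7, mean = 29.7, sd = 0.907)
```

For a report it is usually worth fixing the seed once at the top of the document, so readers see the same sample that was described in the text.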

Exercise

Here is the code that was used to make the cartogram from earlier in the session. Can you make a ggdibbler version of this plot?

Code
library(sf)
library(cartogram)
library(ggplot2)

# Transform to a projected CRS (Web Mercator) for the cartogram transformation
toy_merc <- st_transform(toy_temp_mean, 3857)
# Cartogram transformation: resize each county according to its sample size n
toy_cartogram <- cartogram_cont(toy_merc, weight = "n", itermax = 5)
# Transform back to the original CRS
toy_cartogram <- st_transform(toy_cartogram, st_crs(toy_temp_mean))

# Plot cartogram using ggplot2
ggplot(toy_cartogram) +
  geom_sf(aes(fill = temp_mean), linewidth = 0, alpha = 0.9) +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio = 0.7)

Solution

Code
library(dplyr)
library(ggplot2)
library(ggdibbler)

# The only change to the data is adding a distribution column.
# dist_normal() takes a standard deviation (sigma), so pass temp_se, not temp_se^2
toy_cartogram |>
  mutate(temp_dist = distributional::dist_normal(temp_mean, temp_se)) |>
  ggplot() +
  geom_sf_sample(aes(geometry = county_geometry,
                     fill = temp_dist), linewidth = 0) +
  geom_sf(aes(geometry = county_geometry), fill = NA, colour = "white") +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio = 0.7)